1 Introduction

At the heart of WCET analysis lies the pipeline analysis. The pipeline analysis simulates the execution of a task/program based on a timing model of the executing processor. This timing model operates at the granularity of processor cycles. Because of abstractions, this simulation is non-deterministic. To be safe, the analysis has to follow all possible paths. Because of timing anomalies, the analysis cannot follow only the locally worst paths. In [RS09], Reineke and Sen presented a formal method to safely discard the non-local-worst-case branches. This is accomplished by precomputing a function for all pairs of possible processor states. Unfortunately, for processor models of reasonable complexity, this precomputation is intractable.

In this work, we propose a formalism to specify processor models in a way that facilitates deriving properties of them that are valid for all program sequences. This formalism is based on the fact that processors consist of different interacting units. In a reasonably sophisticated processor, a unit remains idle for a significant number of cycles during program execution, either because it is awaiting input(s) from other unit(s) or because the buffer to which it outputs is full. Any cycle-level simulation (more formally, activity-based simulation) is bound to represent these idle unit states, resulting in a large state space. Our formalism relies on event-based simulation of the program execution in the pipeline. Using this formalism, the analysis keeps track only of the unit states in which operations are accomplished, potentially making the processor model tractable to analyze.
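To make the contrast concrete, the following Python sketch (an illustration, not part of the formalism) models a single hypothetical unit that completes one operation every `delay` cycles. The cycle-stepped loop visits one state per cycle, whether or not anything completes in it. The event-driven loop uses a priority queue (Python's `heapq`) and jumps directly to the next completion. The function names and the single-unit setting are assumptions chosen for illustration.

```python
import heapq
from typing import Tuple


def cycle_stepped(delay: int, n_ops: int) -> Tuple[int, int]:
    """Advance one cycle at a time; return (final time, states visited)."""
    t, states, done, remaining = 0, 0, 0, delay
    while done < n_ops:
        t += 1
        states += 1          # every cycle is represented, even if nothing completes
        remaining -= 1
        if remaining == 0:   # the current operation completes in this cycle
            done += 1
            remaining = delay
    return t, states


def event_driven(delay: int, n_ops: int) -> Tuple[int, int]:
    """Jump from one completion event to the next; return (final time, states visited)."""
    queue = [(delay, 0)] if n_ops > 0 else []   # (completion time, operation index)
    t, states = 0, 0
    while queue:
        t, k = heapq.heappop(queue)
        states += 1          # only time points where an operation completes
        if k + 1 < n_ops:
            heapq.heappush(queue, (t + delay, k + 1))
    return t, states
```

With `delay=16` and four operations, both loops end at cycle 64. The cycle-stepped loop visits 64 states, and the event-driven loop visits only 4.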

2 Formalism 

Pipeline analyses only simulate the timing behavior of the processor; other details are irrelevant. That is, it only matters how long a unit takes to perform an operation, not how it actually performs it. Following this intuition, units in our formalism are generic modules that consume tokens, process them for a certain number of cycles, and then produce tokens. The interaction between different units is formalized in the next section.

2.1 Producer-Buffer-Consumer State 

A Producer-Buffer-Consumer state represents two units, a producer and a consumer, communicating through a bounded buffer. Because we adopt an event-based simulation mechanism, the producer and the consumer can operate at different time points. Therefore, the state keeps track of the difference between the times at which each of them started its operation (i.e. the time of start, or ts):

Definition 1. (Producer-Buffer-Consumer state).

An element of $\mathit{PBCState} \subseteq \mathbb{Z} \times \mathbb{N}$ is a tuple $(\Delta ts, \mathit{buffer\text{-}size})$ where:

• $\Delta ts := \mathit{consumer.ts} - \mathit{producer.ts}$, and

• buffer-size := number of elements in the buffer between producer and consumer at the time point $\min(\mathit{producer.ts}, \mathit{consumer.ts})$

□
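Definition 1 can be mirrored by a small immutable record. The following Python sketch does this. The class name `PBCState` follows the text, but the field names and the helper `pbc_state_from`, which builds the relative state from absolute start times, are hypothetical. The helper shows that only the difference between the two start times is kept, so shifting both by the same offset yields the same state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PBCState:
    """Producer-Buffer-Consumer state of Definition 1 (sketch)."""
    delta_ts: int      # Δts = consumer.ts - producer.ts (may be negative)
    buffer_size: int   # tokens in the buffer at min(producer.ts, consumer.ts)

    def __post_init__(self) -> None:
        if self.buffer_size < 0:
            raise ValueError("buffer_size must be a natural number")


def pbc_state_from(producer_ts: int, consumer_ts: int, tokens_at_min_ts: int) -> PBCState:
    """Hypothetical helper: keep only the relative start time of the consumer."""
    return PBCState(consumer_ts - producer_ts, tokens_at_min_ts)


# Absolute times 3/5 and 103/105 describe the same relative situation.
assert pbc_state_from(103, 105, 1) == pbc_state_from(3, 5, 1)
```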

During simulation, an $s \in \mathit{PBCState}$ is updated according to the following definition, where the term delay refers to the amount of time a unit needs to complete an operation:

Definition 2. (PBCState update).

$\mathit{PBCState\text{-}update} : \mathit{PBCState} \times \mathbb{N} \times \mathbb{N} \to \mathit{PBCState}$ is defined by the following function:

function PBCState-update(s, p.delay, c.delay):
    let p.ts = 0, c.ts = s.Δts, p.ts' = p.ts + p.delay, c.ts' = c.ts + c.delay
    find p.ts', c.ts', Δbuffer-size according to the following table:
    [Table: cases determining p.ts', c.ts' and Δbuffer-size]
    return PBCState((c.ts' − p.ts'), s.buffer-size + Δbuffer-size)

□
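The structure of Definition 2 can be sketched in Python with the case table passed in as a function, because its rows are not reproduced above. The sketch shows the normalization that puts the producer's start at time 0 and the computation of the result from the adjusted timestamps. The `CaseTable` signature and the toy `no_stall_table` are hypothetical stand-ins, not the paper's table.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class PBCState:
    delta_ts: int      # Δts = consumer.ts - producer.ts
    buffer_size: int   # tokens in the buffer at min(producer.ts, consumer.ts)


# Hypothetical stand-in for the case table: it receives the state and the
# default timestamps and returns (p_ts_new, c_ts_new, d_buffer).
CaseTable = Callable[[PBCState, int, int, int, int], Tuple[int, int, int]]


def pbc_state_update(s: PBCState, p_delay: int, c_delay: int, table: CaseTable) -> PBCState:
    p_ts = 0                    # the producer's start is the reference time point
    c_ts = s.delta_ts           # the consumer's start, relative to the producer
    p_ts_new = p_ts + p_delay   # default completion times; the table may
    c_ts_new = c_ts + c_delay   # adjust them (e.g. when the producer stalls)
    p_ts_new, c_ts_new, d_buffer = table(s, p_ts, c_ts, p_ts_new, c_ts_new)
    return PBCState(c_ts_new - p_ts_new, s.buffer_size + d_buffer)


def no_stall_table(s, p_ts, c_ts, p_ts_new, c_ts_new):
    """Toy table: keep the default timestamps and leave the buffer unchanged."""
    return p_ts_new, c_ts_new, 0
```

For example, `pbc_state_update(PBCState(2, 1), 1, 16, no_stall_table)` gives `PBCState(17, 1)`. The consumer finishes at 2 + 16 = 18 and the producer at 1, relative to the producer's start.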

In the table in Definition 2, the first case covers the situation where the producer has to stall for longer than its delay because the output buffer is full. The function producer-stall-time specifies the amount of time it takes the producer to complete an operation, i.e. to produce a new token:
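Definition 3. (Producer stall time).

The function $\mathit{producer\text{-}stall\text{-}time} : \mathit{PBCState} \times \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ is defined as follows:

$$\mathit{producer\text{-}stall\text{-}time}(s, \mathit{p.delay}, \mathit{c.delay}) := \begin{cases} s.\Delta ts & \text{if the first case holds in Definition 2} \\ \mathit{p.delay} & \text{otherwise} \end{cases}$$

□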
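Definition 3 is a two-way case split. The following Python sketch makes it executable. The table row that defines "the first case" is not reproduced, so `first_case_holds` encodes an assumed reading of the prose: the buffer is full, and the consumer frees a slot (at relative time Δts) only after the producer would otherwise have finished. The extra `capacity` argument is also an assumption. It would come from the connection, since a PBCState does not record the capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PBCState:
    delta_ts: int      # Δts = consumer.ts - producer.ts
    buffer_size: int   # tokens in the buffer at min(producer.ts, consumer.ts)


def first_case_holds(s: PBCState, p_delay: int, capacity: int) -> bool:
    # ASSUMPTION: reconstructed from the prose, not from the table of Definition 2.
    return s.buffer_size >= capacity and s.delta_ts > p_delay


def producer_stall_time(s: PBCState, p_delay: int, c_delay: int, capacity: int) -> int:
    # c_delay is part of the signature in Definition 3 but unused in this sketch.
    if first_case_holds(s, p_delay, capacity):
        return s.delta_ts   # wait until the consumer takes a token from the full buffer
    return p_delay          # otherwise the producer just needs its own delay
```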



2.2 Processor Models and States 

Definition 4. (Processor model).

A processor model $\mathcal{M}$ is a tuple (ULab, connections, delays), where:

• ULab := the set of all unit labels

• connections $\subseteq \mathit{ULab} \times \mathit{ULab} \times \mathbb{N}$, with elements (u1, u2, buffer-capacity)

• delays : $\mathit{ULab} \to \mathbb{N}^+$

Additionally, we define the following functions:

• entry-unit($\mathcal{M}$) := the label of the unit where instructions enter the pipeline; restrictions: fan-in = 0 and fan-out = 1

• exit-unit($\mathcal{M}$) := the label of the unit where instructions go after retiring; restrictions: fan-out = 0 and unbounded input buffer(s)

• unit-level($\mathcal{M}$, u), for u ∈ ULab := the distance of u to exit-unit($\mathcal{M}$); e.g. the unit level of the exit unit is zero.

• retire-units($\mathcal{M}$) := $\{u \in \mathit{ULab} : \mathit{unit\text{-}level}(\mathcal{M}, u) = 1\}$

□
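The following Python sketch encodes Definition 4 and computes the derived functions for the example processor of Section 2.3 (i → sfx → SINK). The following are assumptions: `None` marks an unbounded buffer, "distance" is read as the shortest-path length along connections (computed by a backwards breadth-first search from the exit unit), the buffer capacity of 2 is hypothetical, and the SINK delay is a placeholder.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

# (producer, consumer, buffer capacity); None = unbounded (assumption)
Connection = Tuple[str, str, Optional[int]]


@dataclass
class ProcessorModel:
    ulab: Set[str]
    connections: Set[Connection]
    delays: Dict[str, int]

    def fan_in(self, u: str) -> int:
        return sum(1 for (_, v, _) in self.connections if v == u)

    def fan_out(self, u: str) -> int:
        return sum(1 for (v, _, _) in self.connections if v == u)

    def entry_unit(self) -> str:
        # Unpacking fails unless exactly one unit satisfies the restrictions.
        (u,) = [u for u in self.ulab if self.fan_in(u) == 0 and self.fan_out(u) == 1]
        return u

    def exit_unit(self) -> str:
        (u,) = [u for u in self.ulab if self.fan_out(u) == 0]
        if any(cap is not None for (_, v, cap) in self.connections if v == u):
            raise ValueError("the exit unit must have unbounded input buffers")
        return u

    def unit_levels(self) -> Dict[str, int]:
        # Breadth-first search backwards from the exit unit.
        exit_u = self.exit_unit()
        level = {exit_u: 0}
        frontier = deque([exit_u])
        while frontier:
            v = frontier.popleft()
            for (u, w, _) in self.connections:
                if w == v and u not in level:
                    level[u] = level[v] + 1
                    frontier.append(u)
        return level

    def retire_units(self) -> Set[str]:
        return {u for u, l in self.unit_levels().items() if l == 1}


example = ProcessorModel(
    ulab={"i", "sfx", "SINK"},
    connections={("i", "sfx", 2), ("sfx", "SINK", None)},  # capacity 2 is hypothetical
    delays={"i": 1, "sfx": 16, "SINK": 1},                 # SINK delay is a placeholder
)
```

For this model, the levels are SINK = 0, sfx = 1 and i = 2, so sfx is the only retire unit.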

Definition 5. (Processor state).

The set PState of processor states of a processor whose model is $\mathcal{M}$ consists of tuples (pou, udm), where:

• pou : $\mathcal{M}.\mathit{connections} \to \mathit{PBCState}$ (pairs of units)

• udm : $\mathit{ULab} \to \mathbb{N}$ (unit delay map)

Additionally, we define the following functions:

• buffer-occupancy-at-level($\mathcal{M}$, ps ∈ PState, l) := $\sum \{\mathit{PBCState\text{-}update}(ps.pou(c), ps.udm(u_1), ps.udm(u_2)).\mathit{buffer\text{-}size} : c = (u_1, u_2, \mathit{capacity}) \wedge \mathit{unit\text{-}level}(\mathcal{M}, u_1) = l\}$

• producer-stall-at-level($\mathcal{M}$, ps ∈ PState, l) := $\{\mathit{producer\text{-}stall\text{-}time}(ps.pou(c), ps.udm(u_1), ps.udm(u_2)) : c = (u_1, u_2, \mathit{capacity}) \wedge \mathit{unit\text{-}level}(\mathcal{M}, u_1) = l\}$

□
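The two level functions of Definition 5 aggregate over the connections whose producer u1 sits at level l. The following Python sketch shows this. PBCState-update and producer-stall-time are passed in as functions because their case table is not reproduced. The parameter names, the `ProcessorState` class, and the choice to return stall times as a sorted list (so that equal values are kept, as a multiset would) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class PBCState:
    delta_ts: int
    buffer_size: int


Connection = Tuple[str, str, Optional[int]]   # (u1, u2, capacity); None = unbounded


@dataclass
class ProcessorState:
    pou: Dict[Connection, PBCState]   # one PBCState per connection (pair of units)
    udm: Dict[str, int]               # unit delay map


UpdateFn = Callable[[PBCState, int, int], PBCState]   # stand-in for PBCState-update
StallFn = Callable[[PBCState, int, int], int]         # stand-in for producer-stall-time


def at_level(connections: Sequence[Connection], unit_level: Dict[str, int], level: int):
    """Connections whose producer u1 has the given unit level."""
    return [c for c in connections if unit_level[c[0]] == level]


def buffer_occupancy_at_level(connections, unit_level, ps: ProcessorState,
                              level: int, update: UpdateFn) -> int:
    return sum(update(ps.pou[c], ps.udm[c[0]], ps.udm[c[1]]).buffer_size
               for c in at_level(connections, unit_level, level))


def producer_stall_at_level(connections, unit_level, ps: ProcessorState,
                            level: int, stall_time: StallFn) -> List[int]:
    return sorted(stall_time(ps.pou[c], ps.udm[c[0]], ps.udm[c[1]])
                  for c in at_level(connections, unit_level, level))


# Toy usage with the example processor; the update and stall functions are placeholders.
conns = [("i", "sfx", 2), ("sfx", "SINK", None)]
levels = {"i": 2, "sfx": 1, "SINK": 0}
ps = ProcessorState(pou={conns[0]: PBCState(16, 2), conns[1]: PBCState(0, 3)},
                    udm={"i": 1, "sfx": 16, "SINK": 1})
identity_update = lambda s, p_delay, c_delay: s
delay_only_stall = lambda s, p_delay, c_delay: p_delay
```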

The update function of a processor state is based on the function PBCState-update 
defined in the previous section. 
Figure 1: The reachability graph of a simple processor.

2.3 An Example Processor 

To illustrate how the simulation proceeds in our formalism, Figure 1 depicts the reachability graph of a simple processor model. The processor consists of an instruction unit i and a simple function unit sfx. The instruction unit in our model is very fast (i.e. it always hits in the instruction cache): it fetches a new instruction into the pipeline every cycle. The function unit is slower: it takes 16 cycles to process an instruction and pass it on to the SINK. The reachability graph is based on a total order on processor states, which is formalized in the following section.
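As an illustration of the example processor, the following Python sketch computes per-instruction times for i → buffer → sfx → SINK. It counts how many cycles a cycle-level simulation would step through and compares that with the number of distinct time points at which an operation completes. The scheduling recurrences are an assumed textbook semantics, not the case table of Definition 2. The buffer capacity of 2 is hypothetical, because the text does not give it.

```python
from typing import List, Tuple


def schedule(n_instr: int, fetch_delay: int = 1, fu_delay: int = 16,
             capacity: int = 2) -> Tuple[List[int], List[int], List[int]]:
    """Return (deposit, start, end) times per instruction.

    Assumed semantics: i deposits an instruction every fetch_delay cycles but
    stalls while the buffer holds `capacity` instructions; sfx starts the oldest
    buffered instruction as soon as it is idle and passes it to SINK after
    fu_delay cycles.
    """
    deposit: List[int] = []
    start: List[int] = []
    end: List[int] = []
    for k in range(n_instr):
        ready = (deposit[-1] if deposit else 0) + fetch_delay
        if k >= capacity:
            # A slot frees only when instruction k - capacity leaves the buffer.
            ready = max(ready, start[k - capacity])
        deposit.append(ready)
        start.append(max(ready, end[-1] if end else 0))
        end.append(start[-1] + fu_delay)
    return deposit, start, end


deposit, start, end = schedule(4)
cycles = end[-1]                               # cycles a cycle-level simulation steps through
completions = sorted(set(deposit) | set(end))  # time points where an operation completes
# 65 cycles versus 7 completion time points
```

Under these assumptions, the fetch unit stalls from cycle 4 until cycle 17, waiting for sfx to free a buffer slot. Each of those stalled cycles would be a separate state in a cycle-level simulation.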
2.4 The Total Order on Processor States



Specifying the processor state in terms of producer-buffer-consumer states allows us to define a total order relation on processor states. The relation ≺ is defined as follows:

Definition 6. (Ordering relation on processor states).

For processor states s1, s2 ∈ PState of a processor whose model is $\mathcal{M}$, s1 ≺ s2 (read: s1 is less performant than s2) is defined by the following function:

s1 ≺ s2 := λ s1, s2 .
    for l in 1 .. nlevels(M) do
        if l > 1 ∧ buffer-occupancy-at-level(M, s1, l) < buffer-occupancy-at-level(M, s2, l) then
            return true
        end if
        for t1 in sorted(producer-stall-at-level(M, s1, l)),
            t2 in sorted(producer-stall-at-level(M, s2, l)) do
            if t1 > t2 then
                return true
            end if
        end for
    end for
    return false

□
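The following Python sketch transcribes the pseudocode of Definition 6 directly. It assumes each state has already been summarized per level as a pair (buffer occupancy, producer stall times), i.e. the values of buffer-occupancy-at-level and producer-stall-at-level. The `LevelSummary` representation and the name `less_performant` are hypothetical. The inner loop over t1 and t2 is read as a pairwise walk over the two sorted lists.

```python
from typing import Dict, List, Tuple

# Hypothetical per-state summary: level -> (buffer occupancy, producer stall times)
LevelSummary = Dict[int, Tuple[int, List[int]]]


def less_performant(s1: LevelSummary, s2: LevelSummary, nlevels: int) -> bool:
    """s1 ≺ s2 as transcribed from the pseudocode of Definition 6."""
    for level in range(1, nlevels + 1):   # level 1 (next to the exit unit) first
        occ1, stalls1 = s1[level]
        occ2, stalls2 = s2[level]
        # Level-1 buffers feed the exit unit; they are unbounded and skipped.
        if level > 1 and occ1 < occ2:
            return True
        # Pairwise comparison of sorted stall times: later production is worse.
        for t1, t2 in zip(sorted(stalls1), sorted(stalls2)):
            if t1 > t2:
                return True
    return False


s_low = {1: (0, [16]), 2: (1, [1])}    # one instruction buffered in front of sfx
s_high = {1: (0, [16]), 2: (2, [1])}   # two instructions buffered in front of sfx
```

As transcribed, the function returns true as soon as s1 is worse at some level. It does not return early when s1 is better at a level. A strictly lexicographic variant would return false at that point instead.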

The ordering relation is based on the fact that a state with a full buffer is more performant than one with an empty buffer. The significance of a buffer depends on its distance to the exit unit; e.g. the buffers that indicate the most progress are the ones connecting the retire units to the exit unit. These unbounded buffers are, however, not considered in the ordering relation (hence the condition l > 1 in Definition 6), since they cannot cause any stall in the pipeline. If two states have the same buffer-size valuation at one level, one state is more performant than the other if it will produce a new token in the buffer before the other; hence the check with the function producer-stall-at-level.

3 Discussion and Future Work 

In this work, we have introduced a novel formalism for modeling pipeline analysis that adopts an event-based simulation mechanism. The formalism allows for a uniform, systematic development of processor states. The simulation proceeds at the speed of instruction semantics rather than at the speed of cycle semantics. The formalism allows for an intuitive total ordering relation between processor states. It has the potential to reduce the size of the reachability graph and consequently to facilitate the derivation of static properties of the analyses, e.g. the function used in [RS09] to safely discard branches in the presence of timing anomalies.

The processor models considered so far are simple, and the method is to be extended to represent real-world processors (e.g. to model processors accepting different classes of instructions, and units consuming from multiple units rather than a single one).

References 

[RS09] Jan Reineke and Rathijit Sen. Sound and efficient WCET analysis in the presence of timing anomalies. In Proceedings of the 9th International Workshop on Worst-Case Execution Time (WCET) Analysis, June 2009.






